{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GPU计算\n",
    "\n",
    "到目前为止，我们一直在使用CPU计算。对复杂的神经网络和大规模的数据来说，使用CPU来计算可能不够高效。在本节中，我们将介绍如何使用单块NVIDIA GPU来计算。首先，需要确保已经安装好了至少一块NVIDIA GPU。然后，下载CUDA并按照提示设置好相应的路径（可参考附录中[“使用AWS运行代码”](../chapter_appendix/aws.ipynb)一节）。这些准备工作都完成后，下面就可以通过`nvidia-smi`命令来查看显卡信息了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "1"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Thu Nov  7 15:19:36 2019       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  GeForce RTX 208...  On   | 00000000:18:00.0  On |                  N/A |\n",
      "| 45%   44C    P8    16W / 280W |     31MiB / 10989MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   1  GeForce RTX 208...  On   | 00000000:AF:00.0  On |                  N/A |\n",
      "| 45%   42C    P8     5W / 280W |     11MiB / 10989MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                       GPU Memory |\n",
      "|  GPU       PID   Type   Process name                             Usage      |\n",
      "|=============================================================================|\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi  # 对Linux/macOS用户有效"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，我们需要确认安装了PyTorch的GPU版本。安装方法见[“获取和运行本书的代码”](../chapter_prerequisite/install.ipynb)一节。运行本节中的程序需要至少2块GPU。\n",
    "\n",
    "## 计算设备\n",
    "\n",
    "PyTorch可以指定用来存储和计算的设备，如使用内存的CPU或者使用显存的GPU。默认情况下，PyTorch会将数据创建在内存，然后利用CPU来计算。在PyTorch中，`torch.device('cpu')`（或者在括号里填任意整数）表示所有的物理CPU和内存。这意味着，MXNet的计算会尽量使用所有的CPU核。但`torch.device('cuda')`只代表一块GPU和相应的显存。如果有多块GPU，我们用`torch.cuda.device_count()`来获得GPU数量，之后设置`os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"i\"`（$i$从0开始）来指定全局使用的默认GPU，或者设置`device`属性为`cuda:i`来指定当前使用的GPU。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(device(type='cpu'), device(type='cuda'))"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "torch.device('cpu'), torch.device('cuda')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Tensor`的GPU计算\n",
    "\n",
    "在默认情况下，`Tensor`存在内存上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1., 2., 3.])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = torch.Tensor([1, 2, 3])\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以通过`Tensor`的`device`属性来查看该`Tensor`所在的设备。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "device(type='cpu')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GPU上的存储\n",
    "\n",
    "我们有多种方法将`Tensor`存储在显存上。例如，我们可以在创建`Tensor`的时候通过`device`参数指定存储设备。下面我们将`Tensor`变量`a`创建在`cuda`上。注意，在打印`a`时，设备信息变成了`cuda:0`。创建在显存上的`Tensor`只消耗同一块显卡的显存。我们可以通过`nvidia-smi`命令查看显存的使用情况。通常，我们需要确保不创建超过显存上限的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "5"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1., 2., 3.], device='cuda:0')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = torch.Tensor([1, 2, 3]).to('cuda')\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设至少有2块GPU，下面代码将会在`cuda:1`上创建随机数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0.6489, 0.3837, 0.8226],\n",
       "        [0.1939, 0.6765, 0.7554]], device='cuda:1')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B = torch.rand(2, 3, device='cuda:1')\n",
    "B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了在创建时指定，我们也可以通过`to`函数在设备之间传输数据。下面我们将内存上的`Tensor`变量`x`复制到`cuda`上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1., 2., 3.], device='cuda:0')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = x.to('cuda')\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1., 2., 3.], device='cuda:0')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z = x.to('cuda')\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "需要区分的是，如果源变量和目标变量的`device`一致，`to`函数使目标变量和源变量共享源变量的内存或显存。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "8"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y.to('cuda') is y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GPU上的计算\n",
    "\n",
    "PyTorch的计算会在数据的`device`属性所指定的设备上执行。为了使用GPU计算，我们只需要事先将数据存储在显存上。计算结果会自动保存在同一块显卡的显存上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "9"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([ 20.0855, 109.1963, 445.2395], device='cuda:0')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(z + 2).exp() * y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意，PyTorch要求计算的所有输入数据都在内存或同一块显卡的显存上。这样设计的原因是CPU和不同的GPU之间的数据交互通常比较耗时。因此，PyTorch希望用户确切地指明计算的输入数据都在内存或同一块显卡的显存上。例如，如果将内存上的`Tensor`变量`x`和显存上的`Tensor`变量`y`做运算，会出现错误信息。当我们打印`Tensor`或将`Tensor`转换成NumPy格式时，如果数据不在内存里，PyTorch会将它先复制到内存，从而造成额外的传输开销。\n",
    "\n",
    "## 模型的GPU计算\n",
    "\n",
    "同`Tensor`类似，PyTorch的模型可以在初始化时通过`to`函数转移到指定设备。下面的代码将模型参数初始化在显存上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "12"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sequential(\n",
       "  (0): Linear(in_features=3, out_features=1, bias=True)\n",
       ")"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = nn.Sequential(\n",
    "    nn.Linear(3, 1)\n",
    ")\n",
    "net.to('cuda')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当输入是显存上的`Tensor`时，PyTorch会在同一块显卡的显存上计算结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "13"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0.8643], device='cuda:0', grad_fn=<AddBackward0>)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们确认一下模型参数存储在同一块显卡的显存上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "14"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.4655,  0.4900, -0.1772]], device='cuda:0')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "* PyTorch可以指定用来存储和计算的设备，如使用内存的CPU或者使用显存的GPU。在默认情况下，PyTorch会将数据创建在内存，然后利用CPU来计算。\n",
    "* PyTorch要求计算的所有输入数据都在内存或同一块显卡的显存上。\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 试试大一点儿的计算任务，如大矩阵的乘法，看看使用CPU和GPU的速度区别。如果是计算量很小的任务呢？\n",
    "* GPU上应如何读写模型参数？\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## 参考文献\n",
    "\n",
    "[1] CUDA下载地址。 https://developer.nvidia.com/cuda-downloads\n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/988)\n",
    "\n",
    "![](../img/qr_use-gpu.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pytorch]",
   "language": "python",
   "name": "conda-env-pytorch-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
